Discovering the topics of a data source: a statistical approach

Authors

  • Sonia Bergamaschi
  • Davide Ferrari
  • Francesco Guerra
  • Giovanni Simonini
Abstract

In this paper, we present a preliminary approach for automatically discovering the topics of a structured data source with respect to a reference ontology. Our technique relies on a signature, i.e., a weighted graph that summarizes the content of a source. Graph-based approaches have already been used in the literature for similar purposes. In these proposals, the weights are typically assigned using traditional information-theoretic quantities such as entropy and mutual information. Here, we propose a novel data-driven technique based on composite likelihood to estimate the weights and other main features of the graphs, making the resulting approach less sensitive to overfitting. By comparing signatures, we can discover the topic of a target data source with respect to a reference ontology. This comparison is performed by a matching algorithm that retrieves the elements common to both graphs. To illustrate our approach, we discuss a preliminary evaluation in the form of a running example.
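
The comparison of signatures described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the matching algorithm, so the representation of a signature as a mapping from undirected edges to weights, the names common_elements and overlap_score, the choice of the smaller weight for a shared edge, and the example attribute names are all hypothetical. The weights are simply given as input here; the composite-likelihood estimation proposed in the paper is not reproduced.

```python
from typing import Dict, FrozenSet

# Hypothetical representation: a signature maps an undirected edge
# (a pair of terms) to a non-negative weight.
Signature = Dict[FrozenSet[str], float]


def common_elements(source: Signature, reference: Signature) -> Signature:
    """Return the edges present in both signatures.

    Assumption: a shared edge keeps the smaller of its two weights.
    """
    return {
        edge: min(weight, reference[edge])
        for edge, weight in source.items()
        if edge in reference
    }


def overlap_score(source: Signature, reference: Signature) -> float:
    """Fraction of the source's total weight carried by the shared edges."""
    total = sum(source.values())
    if total == 0:
        return 0.0
    return sum(common_elements(source, reference).values()) / total


# Illustrative data only: a target source and one reference topic.
source_signature: Signature = {
    frozenset({"title", "author"}): 0.8,
    frozenset({"title", "year"}): 0.2,
}
reference_signature: Signature = {
    frozenset({"title", "author"}): 0.6,
    frozenset({"author", "publisher"}): 0.4,
}

shared = common_elements(source_signature, reference_signature)
score = overlap_score(source_signature, reference_signature)
```

Under these assumptions, a target source would be assigned the reference topic whose signature yields the highest score; other scoring rules are equally plausible.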





Publication date: 2014